Learning Mixtures of Weighted Tree-Unions by Minimizing Description Length
Abstract
This paper focuses on how to perform the unsupervised clustering of tree structures in an information theoretic setting. We pose the problem of clustering as that of locating a series of archetypes that can be used to represent the variations in tree structure present in the training sample. The archetypes are tree-unions that are formed by merging sets of sample trees, and each node is attributed with a probability that measures its frequency, or weight, in the training sample. The approach is designed to operate when the correspondences between nodes are unknown and must be inferred as part of the learning process. We show how the tree merging process can be posed as the minimisation of an information theoretic minimum description length criterion. We illustrate the utility of the resulting algorithm on the problem of classifying 2D shapes using a shock graph representation.
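The merging idea in the abstract can be illustrated with a toy sketch. This is not the paper's algorithm: it assumes node correspondences are already known (each tree is reduced to a set of shared node labels), ignores edge structure, and uses a generic two-part code (Bernoulli data cost plus half a log of the sample size per node parameter). The function names `description_length` and `merge_gain` are hypothetical.

```python
import math

def description_length(trees):
    """Two-part code length (in nats) of a cluster described by one weighted union.

    Assumption: each tree is a set of node labels, so correspondences are given.
    """
    n = len(trees)
    union = set().union(*trees)  # node set of the toy "tree-union"
    cost = 0.0
    for node in union:
        count = sum(node in t for t in trees)
        p = count / n  # node weight = observed frequency in the cluster
        # Data cost: negative Bernoulli log-likelihood of presence/absence.
        # Nodes present in every tree (p == 1) cost nothing to encode.
        if 0.0 < p < 1.0:
            cost -= count * math.log(p) + (n - count) * math.log(1.0 - p)
    # Model cost: one frequency parameter per union node, (1/2) log n each.
    cost += 0.5 * len(union) * math.log(n)
    return cost

def merge_gain(cluster_a, cluster_b):
    """Description length saved by replacing two unions with a single one."""
    separate = description_length(cluster_a) + description_length(cluster_b)
    merged = description_length(cluster_a + cluster_b)
    return separate - merged

# Two clusters with identical structure, and one with different structure.
a = [{"root", "a", "b"}] * 4
b = [{"root", "a", "b"}] * 4
c = [{"root", "x", "y"}] * 4

print(merge_gain(a, b) > 0)  # merging similar clusters shortens the code
print(merge_gain(a, c) > 0)  # merging dissimilar clusters lengthens it
```

Under these assumptions, a greedy clustering loop would merge the pair with the largest positive gain and stop when no merge reduces the total description length. The cost of encoding cluster assignments is omitted here for brevity.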
Similar resources
Learning Mixtures of Tree-Unions by Minimizing Description Length
This paper focuses on how to perform the unsupervised learning of tree structures in an information theoretic setting. The approach is a purely structural one and is designed to work with representations where the correspondences between nodes are not given, but must be inferred from the structure. This is in contrast with other structural learning algorithms where the node-correspondences are ...
Zoning of flood hazard in Nowshahr city using machine learning models
The aim of this study is to predict and model flood hazard in the city of Nowshahr, Mazandaran province using machine learning models. The criteria and indicators affecting flood hazard were identified based on the review of resources, and then the indicators were converted into rasters in ArcGIS environment, and finally standardized by fuzzy method for use in the models. K-nearest neighbor ...
Comparison of Artificial Neural Network, Decision Tree and Bayesian Network Models in Regional Flood Frequency Analysis using L-moments and Maximum Likelihood Methods in Karkheh and Karun Watersheds
Proper flood discharge forecasting is significant for the design of hydraulic structures, reducing the risk of failure, and minimizing downstream environmental damage. The objective of this study was to investigate the application of machine learning methods in Regional Flood Frequency Analysis (RFFA). To achieve this goal, 18 physiographic, climatic, lithological, and land use parameters were ...
A Mixed Integer Programming Approach to Optimal Feeder Routing for Tree-Based Distribution System: A Case Study
A genetic algorithm is proposed to optimize a tree-structured power distribution network considering optimal cable sizing. For minimizing the total cost of the network, a mixed-integer programming model is presented determining the optimal sizes of cables with minimized location-allocation cost. For designing the distribution lines in a power network, the primary factors must be considered as m...
Fuzzy Programming for Parallel Machines Scheduling: Minimizing Weighted Tardiness/Earliness and Flow Time through Genetic Algorithm
Appropriate scheduling and sequencing of tasks on machines is one of the basic and significant problems that a shop or a factory manager encounters; this is why in recent decades extensive studies have been done on scheduling issues. One type of scheduling problems is just-in-time (JIT) scheduling and in this area, motivated by JIT manufacturing, this study investigates a mathematical model for...